Picture for Haitao Mi

Haitao Mi

Locas: Your Models are Principled Initializers of Locally-Supported Parametric Memories

Add code
Feb 04, 2026
Viaarxiv icon

Verified Critical Step Optimization for LLM Agents

Add code
Feb 03, 2026
Viaarxiv icon

Exploring Information Seeking Agent Consolidation

Add code
Jan 31, 2026
Viaarxiv icon

Save the Good Prefix: Precise Error Penalization via Process-Supervised RL to Enhance LLM Reasoning

Add code
Jan 26, 2026
Viaarxiv icon

Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification

Add code
Jan 22, 2026
Viaarxiv icon

RAGShaper: Eliciting Sophisticated Agentic RAG Skills via Automated Data Synthesis

Add code
Jan 13, 2026
Viaarxiv icon

DocDancer: Towards Agentic Document-Grounded Information Seeking

Add code
Jan 08, 2026
Viaarxiv icon

Stable and Efficient Single-Rollout RL for Multimodal Reasoning

Add code
Dec 20, 2025
Figure 1 for Stable and Efficient Single-Rollout RL for Multimodal Reasoning
Figure 2 for Stable and Efficient Single-Rollout RL for Multimodal Reasoning
Figure 3 for Stable and Efficient Single-Rollout RL for Multimodal Reasoning
Figure 4 for Stable and Efficient Single-Rollout RL for Multimodal Reasoning
Viaarxiv icon

Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning

Add code
Dec 17, 2025
Figure 1 for Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning
Figure 2 for Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning
Figure 3 for Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning
Figure 4 for Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning
Viaarxiv icon

SciAgent: A Unified Multi-Agent System for Generalistic Scientific Reasoning

Add code
Nov 17, 2025
Viaarxiv icon